1 Problems studied


1. Asymptotically consistent methods for variable selection in linear
regression models.

2. Construction of second-order efficient bootstrap confidence intervals for
a binomial proportion.

3. Tree-structured statistical methods for classification and nonlinear
function estimation by recursive partitioning.

4. Asymptotic efficiency of the t-test following Box-Cox transformations,
and the effect of these transformations on linear discriminant analysis.

5. Identification of significant contrasts in unreplicated two-level factorial
experiments.

6. Bounds on the asymptotic size of the likelihood ratio test of
independence in a two-way contingency table.

2 Summary of important results

The following results were obtained for each of the problems listed above.

Bracketed numbers refer to the list of publications in Section 3.

1. There are numerous methods for selecting the variables to include in a
linear model; well-known procedures include Mallows' Cp, Akaike's AIC,
and cross-validation. It is shown in [9] that all of these methods are
inconsistent, in the sense that the probability of selecting the correct
model does not converge to one as the sample size tends to infinity. A
method that imposes a heavier penalty on model complexity is proposed and
proved to be consistent.

Bootstrap methods that attack the same problem are developed and analyzed
in [8] and [10]. One bootstrap method improves upon an existing procedure
by reducing its bias and variance. A second method selects a model based on
bootstrap estimates of the expected prediction error; this method is
consistent even when the errors are heteroscedastic.

2. A method is developed for constructing second-order efficient bootstrap
confidence intervals for a binomial proportion. The method first smooths
the data by convolution and then applies the PI's bootstrap calibration
method (developed in 1987) to construct the interval. Second-order
efficiency is proved via Edgeworth expansions. The results are reported
in [11].

3. Algorithms are developed for tree-structured classification [12],
least-squares regression [5], generalized linear models [7], and
proportional hazards regression [6]. The algorithms employ recursive
partitioning and cross-validation pruning. They are compared with existing
tree-structured and non-tree-structured methods on real and simulated data
sets. The results show that the new algorithms are as accurate as the
existing methods. They are the fastest of the tree-structured methods, up
to several hundred times faster for moderate sample sizes, and their speed
advantage grows rapidly as the sample size and the number of variables
increase. Asymptotic consistency is proved for some of the methods. The
computer programs developed for this project are freely available from the
PI.

4. The family of Box-Cox transformations is well known to be a powerful
data-analytic tool. It is proved in [1] that applying these transformations
before the classical t-test yields a test whose asymptotic relative
efficiency is bounded below by one for all data distributions. Hence, at
least in large samples, applying the Box-Cox transformations before the
t-test is recommended. A similar, though less striking, result is obtained
for linear discriminant analysis in [3].

5. A standard but subjective technique for analyzing data from two-level
factorial experiments is Daniel's normal plot of the contrasts. It is shown
in [2] that this technique is not invariant to the labeling of the factor
levels. A method that does remain invariant and that identifies the
significant contrasts in a completely objective way is proposed.
Comparisons with other methods show that the new method has equivalent
power. The method is now routinely taught in experimental design courses
at the University of Wisconsin, Madison.

6. A long-standing problem in testing the independence of the rows and
columns of a cross-classified table is which test statistic to use and
what minimum observed cell count to require. Both questions hinge on how
the null distribution of the test statistic converges as the sample size
increases. It is proved in [4] that, for the likelihood ratio statistic,
this convergence can be highly non-uniform over the parameter space. This
shows conclusively that it is futile to devise practical recommendations
for the minimum cell count to use with the test.
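Item 1 turns on the size of the model-complexity penalty. The report does not say which heavier penalty is proposed in [9], so the Python sketch below assumes the familiar information-criterion form n*log(RSS/n) + c*k and compares an AIC-type constant c = 2 with a BIC-type c = log n, used here only as one example of a penalty that grows with n. The helper names (`solve`, `rss`, `select`) and the simulated data are hypothetical illustrations, not the method of [9].

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, y, subset):
    """Residual sum of squares of the least-squares fit of y on an intercept plus the columns in subset."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))


def select(rows, y, penalty):
    """Exhaustive best subset under n*log(RSS/n) + penalty * (number of coefficients)."""
    n, p = len(y), len(rows[0])
    subsets = [s for size in range(p + 1) for s in itertools.combinations(range(p), size)]
    return min(subsets, key=lambda s: n * math.log(rss(rows, y, s) / n) + penalty * (len(s) + 1))


# Hypothetical simulation: only predictor 0 affects the response.
rng = random.Random(1)
n, p = 200, 4
rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] + rng.gauss(0, 1) for row in rows]

aic = select(rows, y, penalty=2.0)          # fixed, AIC-type penalty
bic = select(rows, y, penalty=math.log(n))  # heavier, BIC-type penalty (assumed example)
print("AIC-type choice:", aic, " BIC-type choice:", bic)
```

Because log n exceeds 2 once n is at least 8, the heavier penalty can never choose a larger subset than the AIC-type penalty on the same data. A fixed penalty, by contrast, leaves a non-vanishing chance of admitting spurious variables however large n becomes, which is the kind of inconsistency item 1 describes.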
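Item 3 rests on recursive partitioning. The report does not describe how its algorithms choose splits, so the Python sketch below substitutes the generic exhaustive search familiar from CART-style regression trees: at each node, try every variable and cut point, keep the split that most reduces the within-node sum of squares, and recurse. It omits the cross-validation pruning mentioned in item 3; the function names, depth limit, and minimum node size are illustrative assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, min_size):
    """Try every variable and cut point; return (sse, var, cut) for the best split, or None."""
    best = None
    for j in range(len(rows[0])):
        for cut in sorted({row[j] for row in rows})[:-1]:
            left = [yi for row, yi in zip(rows, y) if row[j] <= cut]
            right = [yi for row, yi in zip(rows, y) if row[j] > cut]
            if len(left) < min_size or len(right) < min_size:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, cut)
    return best


def grow(rows, y, depth=0, max_depth=3, min_size=5):
    """Recursively partition the data; each leaf predicts its node mean."""
    split = best_split(rows, y, min_size) if depth < max_depth else None
    if split is None or split[0] >= sse(y):  # stop when no split reduces the SSE
        return {"leaf": sum(y) / len(y)}
    _, j, cut = split
    left = [i for i, row in enumerate(rows) if row[j] <= cut]
    right = [i for i, row in enumerate(rows) if row[j] > cut]
    return {
        "var": j,
        "cut": cut,
        "left": grow([rows[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_size),
        "right": grow([rows[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]


# Illustrative data: a step function of one predictor.
xs = [[i / 20] for i in range(20)]
ys = [0.0 if x[0] < 0.5 else 10.0 for x in xs]
tree = grow(xs, ys)
print(tree["var"], tree["cut"], predict(tree, [0.2]), predict(tree, [0.8]))
```

This naive search re-scans the node for every candidate cut of every variable, so its cost climbs quickly with the sample size and the number of variables; split-selection cost is one reason tree methods can differ greatly in speed.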
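Item 4 refers to the Box-Cox family without defining it. The standard definition for positive observations is given below, together with the usual maximum-likelihood choice of the exponent; the report does not state how the exponent is estimated in [1] and [3], so the second display is an assumption.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt]
  \log y, & \lambda = 0,
\end{cases}
\qquad y > 0,

\hat{\lambda} = \arg\max_{\lambda}\;
\Bigl[ -\tfrac{n}{2} \log \hat{\sigma}^{2}(\lambda) + (\lambda - 1) \sum_{i=1}^{n} \log y_i \Bigr]
```

Here the variance estimate is the maximum-likelihood residual variance of the transformed data under the fitted model (in a two-sample t-test setting, the pooled within-sample variance), and the term (lambda - 1) times the sum of log y_i is the log-Jacobian of the transformation.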
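Item 5 concerns objective identification of active contrasts. The report does not spell out the procedure of [2], so the Python sketch below substitutes a different, well-known objective rule, Lenth's pseudo standard error (PSE) method, purely to illustrate the invariance point: relabeling a factor's levels flips the sign of every contrast involving that factor, so a rule that uses only absolute contrast values cannot change. Daniel's plot, by contrast, uses the signed contrasts. The design, responses, and function names are hypothetical; 2.571 is the 0.975 quantile of Student's t with 5 degrees of freedom, the value Lenth's rule uses for 15 contrasts.

```python
import itertools
import math
import random
import statistics


def contrasts(runs, y):
    """Effect estimates for every main effect and interaction of an unreplicated 2^k design.

    runs: one tuple of -1/+1 factor levels per run; y: the matching responses.
    Each effect is mean(y where the sign column is +1) - mean(y where it is -1).
    """
    k = len(runs[0])
    out = {}
    for size in range(1, k + 1):
        for combo in itertools.combinations(range(k), size):
            signs = [math.prod(run[j] for j in combo) for run in runs]
            out[combo] = 2 * sum(s * yi for s, yi in zip(signs, y)) / len(y)
    return out


def lenth_active(effects, t_crit):
    """Lenth's rule (a stand-in, not the method of [2]): flag |effect| > t_crit * PSE."""
    a = [abs(v) for v in effects.values()]  # only absolute values are used
    s0 = 1.5 * statistics.median(a)
    pse = 1.5 * statistics.median([v for v in a if v < 2.5 * s0])
    return {name for name, v in effects.items() if abs(v) > t_crit * pse}


# Hypothetical 2^4 experiment: factors 0 and 1 are active, plus small noise.
rng = random.Random(0)
runs = list(itertools.product([-1, 1], repeat=4))
y = [10 + 4 * r[0] + 3 * r[1] + rng.uniform(-0.3, 0.3) for r in runs]
active = lenth_active(contrasts(runs, y), t_crit=2.571)

# Swap the "low" and "high" labels of factor 2; the physical runs and responses are unchanged.
relabeled = [(r[0], r[1], -r[2], r[3]) for r in runs]
active_relabeled = lenth_active(contrasts(relabeled, y), t_crit=2.571)
print(sorted(active), active == active_relabeled)
```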
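Item 6 concerns the likelihood ratio statistic for independence in an r × c table, G^2 = 2 * sum(O * log(O / E)) with E = (row total)(column total)/n, which is referred to a chi-square distribution with (r - 1)(c - 1) degrees of freedom. The Python sketch below computes G^2 and that asymptotic p-value, evaluating the chi-square tail by the standard closed-form recurrence for integer degrees of freedom. This chi-square approximation is exactly the one whose non-uniform accuracy [4] establishes, so the p-value carries that caveat. The function names and example table are illustrative.

```python
import math


def g2_independence(table):
    """Return (G^2, df) for the likelihood ratio test of independence in an r x c table of counts."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    g2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            if obs > 0:  # empty cells contribute 0, the limit of O*log(O/E) as O -> 0
                expected = row_tot[i] * col_tot[j] / n
                g2 += obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g2, df


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-square with integer df >= 1.

    Starts from df = 1 (an erfc term) or df = 0 (zero) and adds one closed-form term per 2 df.
    """
    half = x / 2.0
    q, nu = (math.erfc(math.sqrt(half)), 1) if df % 2 else (0.0, 0)
    while nu < df:
        q += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


# Hypothetical 2 x 3 table of counts.
g2, df = g2_independence([[20, 15, 5], [10, 15, 25]])
print(f"G^2 = {g2:.3f} on {df} df, asymptotic p = {chi2_sf(g2, df):.4g}")
```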